Assignment

This assignment aims to solve the problems of Mini Challenge 2.

LI NAN https://www.linkedin.com/in/li-nan-63b9251a6/
07-12-2021

1. Data Preparation

1.1 Global Settings

The global settings of the R code chunks in this post are set as follows.
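The setup chunk itself is not displayed in this post. As an illustration only, global chunk options in R Markdown are usually set with `knitr::opts_chunk$set()`. The option values below are assumptions, not the settings actually used here.

```r
# Hypothetical global chunk options (assumed values, not the post's actual settings)
knitr::opts_chunk$set(
  echo = TRUE,       # show the code of every chunk
  message = FALSE,   # hide messages printed by packages and functions
  warning = FALSE    # hide warnings
)
```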

1.2 R Packages Installation

The following code installs any required R package that is missing and then loads all of them.

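# Packages used in this post
packages <- c('DT', 'ggiraph', 'plotly', 'tidyverse', 'dplyr', 'readr', 'hrbrthemes')

# Install each package that is not yet available, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}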
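The loop above attaches each package with `require()` and installs it on failure. A minimal base R alternative, sketched here under the assumption that installing everything in one call is preferred, first collects the missing packages. The helper name `missing_packages()` is hypothetical; it relies on `system.file(package = p)` returning an empty string when package `p` is not installed.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

packages <- c("DT", "ggiraph", "plotly", "tidyverse", "dplyr", "readr", "hrbrthemes")
to_install <- missing_packages(packages)
print(to_install)

# Install everything that is missing in one call, then attach all packages
# (commented out so that running this sketch does not modify the library):
# if (length(to_install) > 0) install.packages(to_install)
# invisible(lapply(packages, library, character.only = TRUE))
```

Separating the check from the installation makes it easy to see what would be installed before anything is downloaded.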

1.3 Data Import

The following code imports the four raw data sets of Mini Challenge 2 (“car_assignments.csv”, “cc_data.csv”, “gps.csv” and “loyalty_data.csv”) and previews the credit/debit card and loyalty card data.

credit_debit <- read_csv("data/cc_data.csv")
loyalty_data <- read_csv("data/loyalty_data.csv")
car_assignment <- read_csv("data/car_assignments.csv")
GPS <- read_csv("data/gps.csv")
glimpse(credit_debit)
Rows: 1,490
Columns: 4
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp  <chr> "1/8/2014", "1/8/2014", "1/14/2014", "1/9/2014", ~
$ location   <chr> "Carlyle Chemical Inc.", "Carlyle Chemical Inc.",~
$ price      <dbl> 4983.52, 4901.88, 4898.39, 4792.50, 4788.22, 4742~
$ loyaltynum <chr> "L8477", "L5756", "L2769", "L3317", "L8477", "L57~
head(loyalty_data)
# A tibble: 6 x 4
  timestamp location               price loyaltynum
  <chr>     <chr>                  <dbl> <chr>     
1 1/8/2014  Carlyle Chemical Inc.  4984. L8477     
2 1/8/2014  Carlyle Chemical Inc.  4902. L5756     
3 1/14/2014 Abila Airport          4898. L2769     
4 1/9/2014  Abila Airport          4792. L3317     
5 1/15/2014 Maximum Iron and Steel 4788. L8477     
6 1/16/2014 Nationwide Refinery    4743. L5756     
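The output shows that both `timestamp` columns were read as character strings in different formats: credit card timestamps include a time of day (for example "1/6/2014 7:28"), while loyalty card timestamps contain only a date. A minimal base R sketch of parsing both formats follows, using sample values copied from the output above. The UTC time zone is an assumption made only to keep the conversion deterministic.

```r
# Sample timestamps copied from the glimpse() output above
cc_ts      <- c("1/6/2014 7:28", "1/6/2014 7:34")
loyalty_ts <- c("1/8/2014", "1/14/2014")

# Credit card timestamps include a time of day (UTC is an assumed time zone)
cc_time <- as.POSIXct(cc_ts, format = "%m/%d/%Y %H:%M", tz = "UTC")
cc_date <- as.Date(cc_time, tz = "UTC")
cc_hour <- as.integer(format(cc_time, "%H"))

# Loyalty card timestamps contain only a date
loyalty_date <- as.Date(loyalty_ts, format = "%m/%d/%Y")

print(cc_date)
print(cc_hour)
print(loyalty_date)
```

Because only the credit card data carries an hour, any analysis of popularity by time of day would have to rely on that file.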

2. Tasks and Questions for Mini-Challenge 2

2.1 Q1 Introduction

Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?

2.2 Data Preparation for Q1

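The glimpse of the credit and loyalty card data suggests that a heat map is a good way to show the most popular locations and the days on which they are popular. Because the loyalty card timestamps contain only a date, popularity is examined per day. To build this graph, the loyalty card transactions first need to be aggregated into a count per date and location.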

# Flag every transaction with a count of 1
loyalty_data$count_event <- 1

# Count the transactions for each date and location
aggregate_dataset <- loyalty_data %>% 
    group_by(timestamp, location) %>% 
    summarize(Frequency = sum(count_event))
head(aggregate_dataset)
# A tibble: 6 x 3
# Groups:   timestamp [1]
  timestamp location               Frequency
  <chr>     <chr>                      <dbl>
1 1/10/2014 Abila Zacharo                  7
2 1/10/2014 Albert's Fine Clothing         1
3 1/10/2014 Bean There Done That           5
4 1/10/2014 Brew've Been Served           14
5 1/10/2014 Brewed Awakenings              3
6 1/10/2014 Carlyle Chemical Inc.          2
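The same per-date, per-location count can be sketched in base R with `aggregate()`. The sample reuses the six rows printed by `head(loyalty_data)`; the name `loyalty_sample` is illustrative only.

```r
# The six rows printed by head(loyalty_data) above
loyalty_sample <- data.frame(
  timestamp = c("1/8/2014", "1/8/2014", "1/14/2014", "1/9/2014", "1/15/2014", "1/16/2014"),
  location  = c("Carlyle Chemical Inc.", "Carlyle Chemical Inc.", "Abila Airport",
                "Abila Airport", "Maximum Iron and Steel", "Nationwide Refinery"),
  stringsAsFactors = FALSE
)
loyalty_sample$count_event <- 1

# Sum count_event for every timestamp/location combination
freq <- aggregate(count_event ~ timestamp + location, data = loyalty_sample, FUN = sum)
names(freq)[names(freq) == "count_event"] <- "Frequency"
print(freq)
```

The two Carlyle Chemical Inc. transactions on 1/8/2014 collapse into one row with a frequency of 2. In the dplyr version, `summarize()` keeps the result grouped by `timestamp` (hence the `# Groups:` line in the output); passing `.groups = "drop"` to `summarize()` returns an ungrouped tibble instead.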
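Convert the timestamp to the Date type and create a new column named “Day”
# Parse the "month/day/year" strings into Date values
aggregate_dataset$timestamp <- as.Date(aggregate_dataset$timestamp, format = "%m/%d/%Y")
# Extract the zero-padded day of the month ("06", "10", ...)
aggregate_dataset$Day <- format(aggregate_dataset$timestamp, format = "%d")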
head(aggregate_dataset)
# A tibble: 6 x 4
# Groups:   timestamp [1]
  timestamp  location               Frequency Day  
  <date>     <chr>                      <dbl> <chr>
1 2014-01-10 Abila Zacharo                  7 10   
2 2014-01-10 Albert's Fine Clothing         1 10   
3 2014-01-10 Bean There Done That           5 10   
4 2014-01-10 Brew've Been Served           14 10   
5 2014-01-10 Brewed Awakenings              3 10   
6 2014-01-10 Carlyle Chemical Inc.          2 10   
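Because `Day` is stored as a character string, ggplot2 treats it as a discrete axis and orders the days alphabetically. The `%d` format pads single-digit days with a leading zero, which keeps the alphabetical order equal to the calendar order. A short base R check with sample January 2014 dates:

```r
# Sample January 2014 dates in the "month/day/year" format of the raw data
dates <- as.Date(c("1/16/2014", "1/6/2014", "1/10/2014"), format = "%m/%d/%Y")

# Zero-padded day of the month, as created for the Day column
day_padded <- format(dates, format = "%d")

# The same days without padding, for comparison
day_plain <- as.character(as.integer(day_padded))

sort(day_padded)  # "06" "10" "16": alphabetical order matches calendar order
sort(day_plain)   # "10" "16" "6": "6" is placed after "16"
```

Since `%d` drops the month, this column is only unambiguous because all transactions fall in January.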
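Create a new column “text” that holds the tooltip shown when hovering over a tile
# Combine location, day and frequency into one multi-line tooltip string
aggregate_dataset <- aggregate_dataset %>%
  mutate(text = paste0("Location: ", location, "\n", "Day of January: ", Day, "\n", "Frequency: ", Frequency))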
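For a single tile, the tooltip string built by `paste0()` looks as follows. The values come from the Brew've Been Served row printed above; the one-row data frame `tile` is only for illustration.

```r
tile <- data.frame(location = "Brew've Been Served", Day = "10", Frequency = 14,
                   stringsAsFactors = FALSE)

# "\n" separates the three lines of the tooltip
tile$text <- paste0("Location: ", tile$location, "\n",
                    "Day of January: ", tile$Day, "\n",
                    "Frequency: ", tile$Frequency)
cat(tile$text, "\n")
```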
Heat map
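# Tiles: day of January on the x axis, location on the y axis, colored by frequency
p <- ggplot(data = aggregate_dataset, aes(x = Day, y = location, fill = Frequency, text = text)) + 
  geom_tile() +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  theme_ipsum()

# Shrink the location labels so that all of them fit
p <- p + theme(axis.text.y = element_text(size = 8))

# Convert to an interactive plotly chart that shows only the custom tooltip
ggplotly(p, tooltip = "text")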
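Each tile of the heat map is one cell of a location-by-day count table, and the darkest tile is its largest cell. The base R sketch below builds that table from the six rows printed by `head(loyalty_data)` (with the day of January already extracted) and locates the largest cell; it illustrates the idea only and does not reproduce the full plot.

```r
# The six rows printed by head(loyalty_data), with the day of January extracted
loyalty_sample <- data.frame(
  location = c("Carlyle Chemical Inc.", "Carlyle Chemical Inc.", "Abila Airport",
               "Abila Airport", "Maximum Iron and Steel", "Nationwide Refinery"),
  Day      = c("08", "08", "14", "09", "15", "16"),
  stringsAsFactors = FALSE
)

# Location-by-day count table: each cell corresponds to one tile of the heat map
freq_table <- table(location = loyalty_sample$location, Day = loyalty_sample$Day)
print(freq_table)

# Row and column of the largest count, i.e. the darkest tile
cell <- arrayInd(which.max(freq_table), dim(freq_table))
top_location <- rownames(freq_table)[cell[1, 1]]
top_day      <- colnames(freq_table)[cell[1, 2]]
cat(top_location, "on day", top_day, "\n")
```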
Create Loyalty Card Data Table
DT::datatable(aggregate_dataset)
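The data table can be sorted interactively by Frequency to find the busiest locations. The same ranking can be sketched in base R with `order()`. The values are the six 10 January rows printed earlier, so this ranking covers only that day and those rows.

```r
# Rows for 10 January 2014 as printed by head(aggregate_dataset)
jan10 <- data.frame(
  location  = c("Abila Zacharo", "Albert's Fine Clothing", "Bean There Done That",
                "Brew've Been Served", "Brewed Awakenings", "Carlyle Chemical Inc."),
  Frequency = c(7, 1, 5, 14, 3, 2),
  stringsAsFactors = FALSE
)

# Sort by descending frequency so that the busiest locations come first
ranked <- jan10[order(-jan10$Frequency), ]
head(ranked, 3)
```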